Cross validation in LASSO and its acceleration

Authors

  • Tomoyuki Obuchi
  • Yoshiyuki Kabashima
Abstract

We investigate leave-one-out cross validation (CV) as a method for determining the weight of the penalty term in the least absolute shrinkage and selection operator (LASSO). First, on the basis of the message passing algorithm and a perturbative argument assuming that the number of observations is sufficiently large, we provide simple formulas for approximately assessing two types of CV errors, which significantly reduce the computational cost. These formulas also establish a simple connection between the CV errors and the residual sums of squares between the reconstructed and the given measurements. Second, on the basis of this finding, we analytically evaluate the CV errors when the design matrix is a simple random matrix in the large-size limit, using the replica method. Finally, these results are compared with numerical simulations on finite-size systems and are confirmed to be correct. We also apply the simple formula for the first type of CV error to an actual dataset of supernovae.
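The link between the leave-one-out CV error and the residual sum of squares of the full-data fit can be illustrated with a short numerical sketch. The snippet below is not the authors' formula; it is a generic active-set ("approximate leave-one-out") approximation that assumes the LASSO support is unchanged when a single observation is removed. The problem sizes, noise level, and penalty weight `lam` are arbitrary toy choices.

```python
# A minimal sketch (NOT the paper's exact formula): approximating the
# leave-one-out CV error of LASSO from a single fit, via the residuals and a
# leverage-style correction restricted to the active set.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
M, N, K = 200, 400, 20                       # observations, predictors, nonzeros (toy sizes)
X = rng.standard_normal((M, N))
beta_true = np.zeros(N)
beta_true[:K] = rng.standard_normal(K)
y = X @ beta_true + 0.5 * rng.standard_normal(M)

lam = 0.5                                    # penalty weight: the quantity CV is meant to tune
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(X, y)
residual = y - X @ fit.coef_                 # residuals of the full-data fit

# Leverage correction on the active set: assume the support does not change
# when one observation is left out.
active = np.flatnonzero(fit.coef_)
Xa = X[:, active]
h = np.einsum('ij,ji->i', Xa, np.linalg.solve(Xa.T @ Xa, Xa.T))  # diag of active-set hat matrix

rss = np.mean(residual ** 2)                           # residual sum of squares per observation
loo_approx = np.mean((residual / (1.0 - h)) ** 2)      # approximate LOO CV error
print(f"RSS/M = {rss:.4f},  approximate LOO CV error = {loo_approx:.4f}")
```

The approximation can be checked on small problems by explicitly refitting the LASSO M times with one observation removed each time; avoiding that M-fold refitting cost is precisely the motivation for the approximate CV formulas discussed in the abstract.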


Similar articles

The lasso, persistence, and cross-validation

During the last fifteen years, the lasso procedure has been the target of a substantial amount of theoretical and applied research. Correspondingly, many results are known about its behavior for a fixed or optimally chosen smoothing parameter (given up to unknown constants). Much less, however, is known about the lasso’s behavior when the smoothing parameter is chosen in a data dependent way. T...


Bootstrap-based Penalty Choice for the Lasso, Achieving Oracle Performance

In theory, if penalty parameters are chosen appropriately then the lasso can eliminate unnecessary variables in prediction problems, and improve the performance of predictors based on the variables that remain. However, standard methods for tuning-parameter choice, for example techniques based on the bootstrap or cross-validation, are not sufficiently accurate to achieve this level of precision...


The Spike-and-Slab LASSO

Despite the wide adoption of spike-and-slab methodology for Bayesian variable selection, its potential for penalized likelihood estimation has largely been overlooked. In this paper, we bridge this gap by cross-fertilizing these two paradigms with the Spike-and-Slab LASSO procedure for variable selection and parameter estimation in linear regression. We introduce a new class of self-adaptive pe...


Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman

Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...


Oracle inequalities for cross-validation type procedures

Abstract: We prove oracle inequalities for three different types of adaptation procedures inspired by cross-validation and aggregation. These procedures are then applied to the construction of Lasso estimators and aggregation with exponential weights with data-driven regularization and temperature parameters, respectively. We also prove oracle inequalities for the cross-validation procedure itself...



Journal:
  • CoRR

Volume: abs/1601.00881  Issue: -

Pages: -

Publication date: 2015